Smart Paper Notes

Scalable Pathogen Detection from Next Generation DNA Sequencing with Deep Learning

Sai Narayanan, Sathyanarayanan N. Aakur, Priyadharsini Ramamurthy, Arunkumar Bagavathi, Vishalini Ramnath, Akhilesh Ramachandran

Category: Machine Learning

2022-11-30

Next-generation sequencing technologies have extended the scope of the Internet of Things (IoT) to include genomics for personalized medicine, by making abundant genome data from heterogeneous sources available at reduced cost. Given the sheer magnitude of the collected data and the significant challenges posed by highly similar genomic structure across species, there is a need for robust, scalable analysis platforms to extract actionable knowledge such as the presence of potentially zoonotic pathogens. The emergence of zoonotic diseases from novel pathogens that can jump species barriers and lead to pandemics, such as the influenza virus in 1918 and SARS-CoV-2 in 2019, underscores the need for scalable metagenome analysis. In this work, we propose MG2Vec, a deep learning-based solution that uses the transformer network as its backbone to learn robust features from raw metagenome sequences for downstream biomedical tasks such as targeted and generalized pathogen detection. Extensive experiments on four increasingly challenging yet realistic diagnostic settings show that the proposed approach can help detect pathogens from uncurated, real-world clinical samples with minimal human supervision in the form of labels. Further, we demonstrate that the learned representations can generalize to completely unrelated pathogens across diseases and species for large-scale metagenome analysis. We provide a comprehensive evaluation of a novel representation learning framework for metagenome-based disease diagnostics with deep learning, and offer a way forward for extracting and using robust vector representations from low-cost next-generation sequencing to develop generalizable diagnostic tools.

translated by Google Translate

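In this demo paper, we design and prototype Rhythmedge, a low-cost, deep learning-based contactless system for routine heart rate (HR) monitoring applications. Rhythmedge improves on existing approaches through its contactless operation, real-time/offline modes, and inexpensive, readily available sensing components and computing devices. The system is portable and easy to deploy for reliable HR estimation in moderately controlled indoor or outdoor environments. Rhythmedge measures HR by detecting blood volume changes in facial video (remote photoplethysmography; rPPG) and provides instant assessment using off-the-shelf, commercially available, resource-constrained edge platforms and video cameras. We demonstrate the scalability, flexibility, and compatibility of Rhythmedge by deploying it on three resource-constrained platforms with different architectures (Nvidia Jetson Nano, Google Coral Development Board, Raspberry Pi) and three heterogeneous cameras (webcam, action camera, and DSLR). Rhythmedge also stores longitudinal cardiovascular information and provides instant notifications to users. We thoroughly test the prototype's stability, latency, and feasibility on the three edge computing platforms by profiling runtime, memory, and power usage.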
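To make the rPPG idea concrete, here is a minimal sketch of how a heart rate can be read off a pulse trace. It is not the deep learning method described above: it swaps in a classical spectral-peak estimate. The function name `estimate_hr_bpm`, the 0.7 to 4.0 Hz search band, and the synthetic input (standing in for a per-frame mean skin intensity) are all assumptions made for illustration.

```python
import math


def estimate_hr_bpm(signal, fps, lo_hz=0.7, hi_hz=4.0, step_hz=0.01):
    """Return the strongest frequency of `signal` inside [lo_hz, hi_hz], in beats per minute.

    `signal` is assumed to be one intensity value per video frame (hypothetical input).
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant (DC) offset

    best_freq, best_power = lo_hz, -1.0
    steps = int(round((hi_hz - lo_hz) / step_hz))
    for k in range(steps + 1):
        f = lo_hz + k * step_hz
        # Correlate the trace with a cosine and a sine at frequency f.
        re = sum(x * math.cos(2 * math.pi * f * i / fps) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * f * i / fps) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = f, power
    return best_freq * 60.0


# Synthetic trace: a 1.2 Hz pulse (72 beats per minute) sampled at 30 fps for 10 s.
fps = 30
trace = [math.sin(2 * math.pi * 1.2 * i / fps) for i in range(fps * 10)]
bpm = estimate_hr_bpm(trace, fps)
print(round(bpm, 1))
```

On a real recording the trace would be noisy and would typically need detrending and band-pass filtering first; the search band simply restricts the estimate to physiologically plausible heart rates.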

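The use of machine learning (ML)-based language models (LMs) to monitor online content is on the rise. For toxic text identification, task-specific fine-tuning of these models is performed on datasets labeled by annotators, who provide ground-truth labels in an effort to distinguish offensive content from normal content. Over time these projects have led to the development, improvement, and expansion of large datasets and have contributed to research on natural language. Despite these achievements, existing evidence suggests that ML models built on these datasets do not always produce desirable outcomes. Therefore, using a design science research (DSR) approach, this study examines selected toxic text datasets with the goal of shedding light on some of their inherent issues and contributing to the discussion on navigating these challenges in existing and future projects. To that end, we re-annotate samples from three toxic text datasets and find that a multi-label approach to annotating toxic text samples can help improve dataset quality. While this approach may not improve traditional metrics of inter-annotator agreement, it may better capture dependence on context and the diversity among annotators. We discuss the implications of these results for both theory and practice.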

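Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or of two different distributions (the alternative hypothesis). In the traditional formulation of this problem, the statistician has access to both the measurements (feature variables) and the group variable (label variable). In several important applications, however, the feature variables are easy to measure but the binary label variable is unknown and costly to obtain. In this paper, we consider this important variation on the classical two-sample testing problem and pose it as the problem of obtaining labels for only a small number of samples in service of performing a two-sample test. We devise a label-efficient three-stage framework: first, a classifier is trained on uniformly labeled samples to model the posterior probabilities of the labels; second, a novel query scheme called bimodal query is used to query the labels of the samples with the highest posterior probabilities from both classes; and finally, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Our theoretical analysis shows that, under reasonable conditions, bimodal query is optimal for the FR test and that the three-stage framework controls the Type I error. Extensive experiments on synthetic, benchmark, and application-specific datasets show that the three-stage framework reduces the Type II error relative to uniform querying and certainty-based querying with the same number of labels, while controlling the Type I error.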
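The three stages can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: a nearest-centroid score stands in for the trained classifier's posterior, the FR p-value is obtained by permuting labels rather than by the usual normal approximation, and all names (`mst_edges`, `fr_permutation_pvalue`, `three_stage_test`, `oracle`) are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Edges of the Euclidean minimum spanning tree, built with Prim's algorithm."""
    n = len(points)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def fr_permutation_pvalue(points, labels, rng, n_perm=500):
    """FR statistic = number of MST edges joining different labels; few such edges is evidence against H0."""
    edges = mst_edges(points)

    def cross(lab):
        return sum(lab[a] != lab[b] for a, b in edges)

    observed = cross(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += cross(shuffled) <= observed
    return (hits + 1) / (n_perm + 1)


def three_stage_test(points, oracle, n_init, n_query, rng):
    """`oracle(i)` returns the costly 0/1 label of sample i (assumed interface)."""
    # Stage 1: label a uniform subset and fit a stand-in classifier (class centroids).
    init = rng.sample(range(len(points)), n_init)
    init_labels = {i: oracle(i) for i in init}
    cents = {}
    for c in (0, 1):
        members = [points[i] for i in init if init_labels[i] == c]
        if not members:
            raise ValueError("initial sample must contain both classes")
        cents[c] = tuple(sum(col) / len(members) for col in zip(*members))

    # Stage 2: bimodal query, taking the most confident samples for each class.
    # The score grows with the class-1 posterior under the centroid model.
    def score(i):
        return math.dist(points[i], cents[0]) ** 2 - math.dist(points[i], cents[1]) ** 2

    pool = sorted((i for i in range(len(points)) if i not in init_labels), key=score)
    half = n_query // 2
    chosen = pool[:half] + pool[-half:]

    # Stage 3: FR two-sample test on the queried samples only.
    return fr_permutation_pvalue([points[i] for i in chosen], [oracle(i) for i in chosen], rng)


rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
points += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(100)]
true_labels = [0] * 100 + [1] * 100
p_value = three_stage_test(points, lambda i: true_labels[i], n_init=20, n_query=20, rng=rng)
print(p_value)
```

In this toy setup the two groups are well separated, so the queried samples form two clusters joined by very few cross-label tree edges and the p-value should be small. Only 40 of the 200 labels are ever requested from the oracle.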
